Visual field prediction using a deep bidirectional gated recurrent unit network model

Although deep learning architecture has been used to process sequential data, only a few studies have explored the usefulness of deep learning algorithms to detect glaucoma progression. Here, we proposed a bidirectional gated recurrent unit (Bi-GRU) algorithm to predict visual field loss. In total, 5413 eyes from 3321 patients were included in the training set, whereas 1272 eyes from 1272 patients were included in the test set. Data from five consecutive visual field examinations were used as input; the sixth visual field examinations were compared with predictions by the Bi-GRU. The performance of Bi-GRU was compared with the performances of conventional linear regression (LR) and long short-term memory (LSTM) algorithms. Overall prediction error was significantly lower for Bi-GRU than for LR and LSTM algorithms. In pointwise prediction, Bi-GRU showed the lowest prediction error among the three models in most test locations. Furthermore, Bi-GRU was the least affected model in terms of worsening reliability indices and glaucoma severity. Accurate prediction of visual field loss using the Bi-GRU algorithm may facilitate decision-making regarding the treatment of patients with glaucoma.

www.nature.com/scientificreports/
Compared with typical LSTMs, GRU uses its gating units more efficiently and performs at a similar level [15][16][17]. Several studies have shown that GRU performs well in sequential data analysis compared with other RNN types 12,15,18,19. A bidirectional RNN, trained simultaneously in the positive and negative time directions, has been developed to capture context from both past and future inputs 20. Lynn et al. 15 compared several RNN-based models for human identification using electrocardiogram-based biometrics from sequential time-series data. Bidirectional networks built from LSTM and GRU units were more effective than conventional RNN models, and the bidirectional gated recurrent unit (Bi-GRU) model outperformed the bidirectional LSTM model. Because visual field examinations provide sequential data with extensive interconnections, Bi-GRU may predict visual field progression better than the previous LSTM-based RNN model.
To our knowledge, this is the first study to use Bi-GRU to predict visual field damage. In a previous study, we evaluated the performance of LSTM in predicting visual field defects. Because the present study used a larger dataset than our previous work, we developed a computationally efficient RNN-based Bi-GRU model. We compared the performance of the Bi-GRU model with the performances of conventional LR and LSTM models.

Materials and methods
This retrospective study was conducted in accordance with the tenets of the Declaration of Helsinki. Visual field data were collected from glaucoma clinics at Pusan National University Hospital, Kosin University Gospel The requirement for patient consent was waived by the institutional review boards because of the retrospective study design. Sex and diagnostic data were retrospectively collected from medical records.
Participants who completed a minimum of six consecutive visual field examinations were included in the training and test datasets. There was no patient overlap between the two datasets. Eyes with an interval of ≥ 3 years between the first and sixth visual field examinations were included. For example, in an eye with 13 consecutive visual field examinations, the first through sixth examinations were considered the first dataset, the seventh through twelfth examinations were considered the second dataset, and the thirteenth examination was excluded from the dataset. The first five examinations were used as input data to predict the sixth examination, and the seventh through eleventh examinations were used as input data to predict the twelfth examination (Fig. 1).
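The windowing rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_windows` is hypothetical, exam dates are assumed to be sorted chronologically, and the ≥ 3-year criterion is assumed to apply to each six-examination window, with 3 years approximated as 1095 days.

```python
from datetime import date, timedelta

def make_windows(exam_dates, window=6, min_span_days=3 * 365):
    """Split one eye's chronologically sorted exam dates into consecutive,
    non-overlapping windows of `window` exams (hypothetical helper).

    Returns (input_indices, target_index) pairs: the first window - 1 exams
    are inputs and the last one is the prediction target. Trailing exams that
    do not fill a whole window are dropped. Windows whose first-to-last span is
    shorter than min_span_days are skipped (assumption: the >= 3-year rule is
    applied per window, with 3 years taken as 1095 days).
    """
    windows = []
    for start in range(0, len(exam_dates) - window + 1, window):
        idx = list(range(start, start + window))
        span = (exam_dates[idx[-1]] - exam_dates[idx[0]]).days
        if span >= min_span_days:
            windows.append((idx[:-1], idx[-1]))
    return windows

# Example: 13 exams spaced 250 days apart (hypothetical dates).
dates = [date(2010, 1, 1) + timedelta(days=250 * k) for k in range(13)]
print(make_windows(dates))  # [([0, 1, 2, 3, 4], 5), ([6, 7, 8, 9, 10], 11)]
```

As in the 13-examination example in the text, examinations 1-6 and 7-12 form two datasets and the thirteenth is discarded.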
We obtained 8323 datasets, each comprising six consecutive visual field examinations, from 6685 eyes of 4593 participants. Of these, 7051 datasets (85%) and 1272 datasets (15%) were included in the training and test datasets, respectively. The 7051 records in the training dataset were randomly split into training and validation datasets at a ratio of 9:1. The validation dataset was used to monitor the fit of the neural network during training to prevent overfitting. The mean follow-up duration for the six examinations was 4.39 ± 1.69 years. Table 1 presents the characteristics of each dataset.
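A minimal sketch of the 9:1 training/validation split, assuming a plain record-level random shuffle (the text does not say whether records from the same eye were kept together). The function name `split_train_val` and the fixed seed are hypothetical.

```python
import random

def split_train_val(records, val_fraction=0.1, seed=0):
    """Randomly split training records into training and validation subsets.

    Assumption: the split is done per record, not grouped by eye or patient.
    The seed is a hypothetical choice that makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_train_val(range(7051))
print(len(train), len(val))  # 6346 705
```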
Visual field examination. Automated perimetry was conducted using a Humphrey Visual Field Analyzer 750i (Carl Zeiss Meditec, Inc., Dublin, CA, USA) with the 24-2 or 30-2 Swedish interactive threshold algorithm. Of the 54 test points of the 24-2 test pattern, the two points located in the physiological blind spot were excluded, leaving 52 test points for analysis. Results from the 30-2 test pattern were converted to the 24-2 test pattern using the overlapping test points. Reliable visual field tests were defined as those with a false-positive rate < 33%, false-negative rate < 33%, and fixation loss < 33%.
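The point selection and reliability criteria can be expressed as two small helpers. This sketch assumes the 54 threshold values are stored row by row (top row first, left to right) in right-eye orientation, in which case the two blind-spot points sit at 0-based indices 25 and 34; that storage layout is an assumption, and left eyes would need to be mirrored first. The helper names are hypothetical, and rates are given as fractions rather than percentages.

```python
# Assumption: 54 values in row-major order (top row first, left to right) for a
# right-eye 24-2 layout, so the two blind-spot points are at 0-based indices 25 and 34.
# Left eyes would need to be mirrored into right-eye orientation first.
BLIND_SPOT_IDX = (25, 34)

def to_52_points(values_54, blind_spot_idx=BLIND_SPOT_IDX):
    """Drop the two physiological blind-spot points from a 54-point 24-2 field."""
    if len(values_54) != 54:
        raise ValueError("expected 54 values for the 24-2 test pattern")
    return [v for i, v in enumerate(values_54) if i not in blind_spot_idx]

def is_reliable(false_pos, false_neg, fixation_loss, cutoff=0.33):
    """Reliability criteria from the text, with rates given as fractions (0-1)."""
    return false_pos < cutoff and false_neg < cutoff and fixation_loss < cutoff

print(len(to_52_points(list(range(54)))))  # 52
print(is_reliable(0.05, 0.10, 0.20))       # True
print(is_reliable(0.05, 0.35, 0.20))       # False
```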
Artificial neural network. We used the LSTM and Bi-GRU neural network models. Python software (ver- LSTM and Bi-GRU . We built one-layer neural networks to learn the structural information of a specific dataset using preprocessed input. The LSTM cell-based neural networks were defined by Eqs. (1)–(5), in which $b_i$, $b_f$, $b_o$, and $b_C$ represent the biases of the three gates and the memory cell, respectively, and ⨂ is the elementwise product between two vectors. The sigmoid activation function used in the network is $\sigma(x) = 1/(1 + e^{-x})$. The input and output gates regulate the flow of memory cell inputs and outputs throughout the network, while the forget gate, incorporated into the memory cell, transmits highly weighted output information from the previous neuron to the next one. The information residing in the memory depends on the activation results: if the input unit has high activation, information is stored in the memory cell, and if the output unit has high activation, the information is passed to the next neuron. Input information with a high weight thus resides in the memory cell. Sigmoid and tanh are employed as the activation functions for the gates. Here, $h_{t-1}$ represents the prior hidden-layer units, which are combined with the weights of the three gates in an elementwise manner. After Eq. (4) is processed, $C_t$ indicates the current memory cell unit. Equation (5) shows the elementwise multiplication of the prior hidden unit outputs and previous memory cell unit. Nonlinearity is introduced through the tanh and sigmoid activation functions, as shown in Eqs. (1)–(5). Here, t − 1 and t denote the previous and current time steps.
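The LSTM equations themselves are not reproduced in this section. As a hedged illustration, the NumPy sketch below implements one widely used LSTM formulation with the components described above (input, forget, and output gates, a memory cell, biases $b_i$, $b_f$, $b_o$, $b_C$, sigmoid and tanh activations, and elementwise products); we assume, but cannot confirm, that it matches the authors' Eqs. (1)–(5) term for term. The weight initialization and the 32 hidden units are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell. W, U, b are dicts keyed by
    'i' (input gate), 'f' (forget gate), 'o' (output gate) and 'c' (memory cell)."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate memory content
    c_t = f_t * c_prev + i_t * c_hat  # '*' is the elementwise product (the ⨂ in the text)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def init_lstm(n_in, n_hidden, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    W = {g: rng.normal(0.0, 0.1, (n_hidden, n_in)) for g in "ifoc"}
    U = {g: rng.normal(0.0, 0.1, (n_hidden, n_hidden)) for g in "ifoc"}
    b = {g: np.zeros(n_hidden) for g in "ifoc"}
    return W, U, b

def lstm_forward(xs, W, U, b):
    """Run the cell over a sequence and return the hidden state at each time step."""
    n_hidden = b["i"].shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs)  # shape (time steps, n_hidden)

rng = np.random.default_rng(0)
W, U, b = init_lstm(108, 32, rng)  # 108 input features; 32 hidden units is hypothetical
hs = lstm_forward(rng.normal(size=(6, 108)), W, U, b)
print(hs.shape)  # (6, 32)
```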
GRU is a simplified variant of LSTM with only two gates: the update gate, which combines the input and forget gates, and the reset gate. It has no separate memory cell to retain information and controls the flow of information only inside the unit.
The update gate in Eq. (6) determines the extent to which information is updated. The reset gate in Eq. (7) is similar to the update gate; if the reset gate is set to zero, the GRU reads the input sequence and forgets the previously calculated state. The candidate activation $\tilde{h}_t$ functions like the activation of a conventional recurrent unit, and the activation $h_t$ of the GRU at time t is a linear interpolation between the candidate activation $\tilde{h}_t$ and the previous activation $h_{t-1}$, as given in Eqs. (8) and (9).
A Bi-GRU layer was formed by combining a forward GRU with a reverse-direction GRU. Both GRUs receive the same input but process it in opposite time directions, and their outputs are concatenated to produce the layer output. Deep hierarchical neural networks effectively capture specific functions and model dependencies of varying lengths 21. Our experiments revealed that Bi-GRU outperformed the other models on our datasets.
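A minimal NumPy sketch of a GRU step and a Bi-GRU layer follows. It uses the common formulation with update gate $z_t$, reset gate $r_t$, candidate activation $\tilde{h}_t$, and $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; references differ on whether $z_t$ or $1 - z_t$ weights the previous state, so this is one convention and not necessarily the one in Eqs. (6)–(9). The backward GRU runs over the reversed sequence and its outputs are re-aligned to the original time order before concatenation, as described for the Bi-GRU layer. Names, initialization, and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(n_in, n_hidden, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    p = {}
    for g in "zrh":  # update gate, reset gate, candidate activation
        p["W" + g] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p

def gru_step(x_t, h_prev, p):
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])  # update gate
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])  # reset gate
    # With r_t = 0 the candidate ignores the previously calculated state.
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r_t * h_prev) + p["bh"])
    # Linear interpolation between the previous activation and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_cand

def gru_forward(xs, p):
    h = np.zeros(p["bz"].shape[0])
    outputs = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        outputs.append(h)
    return np.stack(outputs)

def bigru_forward(xs, p_fwd, p_bwd):
    """Forward GRU over xs, backward GRU over reversed xs (re-aligned to the
    original time order), concatenated at each time step."""
    fwd = gru_forward(xs, p_fwd)
    bwd = gru_forward(xs[::-1], p_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
p_fwd, p_bwd = init_gru(108, 16, rng), init_gru(108, 16, rng)  # 16 hidden units is hypothetical
xs = rng.normal(size=(6, 108))
out = bigru_forward(xs, p_fwd, p_bwd)
print(out.shape)  # (6, 32)
```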
Proposed method and evaluation. In our proposed method, the deep learning model comprises the input data, a single time-series neural network layer used for sequential prediction, and a dense layer. The neural network structures for LSTM and Bi-GRU are shown in Fig. 2. The single-layer time-series neural network consists of six parallel, connected LSTM or Bi-GRU cells. The detailed structures of the LSTM and GRU cells are presented in Supplementary Figs. S1a and S1b, respectively.
Each of the first five cells uses 108 features as input: 52 total deviation values (TDVs), 52 pattern deviation values (PDVs), three reliability indices (false-negative rate, false-positive rate, and fixation loss percentage), and a time displacement value. To improve the performance of the deep learning model, the input data were normalized to a reasonable range: the TDV, PDV, and time displacement values were divided by 50, 50, and 1000, respectively. Time displacement indicated the number of days relative to the most recent visual field examination. For example, if the most recent visual field examination has a time displacement of 0, an examination performed 1 month (31 days) earlier has a time displacement of −31; a negative sign indicates that the examination was performed in the past. Of the six input data elements, the sixth used a unique format: a positive time displacement (i.e., the point in the future that the user wishes to predict) and 107 zeros. Because all other features were set to 0, this input specifies the exact date for which the user wishes to predict the visual field. The input series was ordered by decreasing time displacement (i.e., from future to past) before being supplied to the neural network. The neural network layer was then connected to a single fully connected (dense) layer with 52 neurons, each of which generated one of the 52 TDVs of the predicted visual field.
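The input encoding described above can be sketched as follows. Assumptions not stated in the text: the time displacement is placed as the last of the 108 features, the three reliability indices are passed through without scaling, and the helper names are hypothetical.

```python
TDV_SCALE, PDV_SCALE, TIME_SCALE = 50.0, 50.0, 1000.0
N_POINTS = 52
N_FEATURES = 2 * N_POINTS + 3 + 1  # 52 TDVs + 52 PDVs + 3 reliability indices + time = 108

def exam_features(tdv, pdv, reliability, days):
    """Feature vector for one past exam. `days` is the time displacement (<= 0)
    relative to the most recent exam. Reliability indices are passed through
    unscaled (assumption: the text does not state their scaling)."""
    assert len(tdv) == N_POINTS and len(pdv) == N_POINTS and len(reliability) == 3
    return ([v / TDV_SCALE for v in tdv]
            + [v / PDV_SCALE for v in pdv]
            + list(reliability)
            + [days / TIME_SCALE])

def query_features(days_ahead):
    """Prediction-target input: 107 zeros plus the positive time displacement."""
    return [0.0] * (N_FEATURES - 1) + [days_ahead / TIME_SCALE]

def build_sequence(exams, days_ahead):
    """exams: five (tdv, pdv, reliability, days) tuples. Returns six rows ordered
    by decreasing time displacement, i.e. from future to past."""
    rows = [exam_features(*e) for e in exams] + [query_features(days_ahead)]
    rows.sort(key=lambda row: row[-1], reverse=True)  # time displacement is the last feature (assumption)
    return rows

# Hypothetical exams roughly one year apart, with all-zero fields for brevity.
offsets = [-1460, -1095, -730, -365, 0]
exams = [([0.0] * 52, [0.0] * 52, (0.02, 0.05, 0.10), d) for d in offsets]
seq = build_sequence(exams, days_ahead=365)
print(len(seq), len(seq[0]))     # 6 108
print([row[-1] for row in seq])  # [0.365, 0.0, -0.365, -0.73, -1.095, -1.46]
```

Changing `days_ahead` is what lets a single trained model be queried for different future dates.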
Statistical analyses. The root mean square error (RMSE) and mean absolute error (MAE) of the TDV were used as accuracy metrics. The RMSE was calculated for each eye as $\mathrm{RMSE} = \sqrt{\frac{1}{52}\sum_{i=1}^{52}(\hat{y}_i - y_i)^2}$, where $\hat{y}_i$ and $y_i$ are the predicted and measured TDVs at test point i. The MAE was calculated for each test point across all eyes as $\mathrm{MAE}_j = \frac{1}{N}\sum_{k=1}^{N}\lvert\hat{y}_{k,j} - y_{k,j}\rvert$, where N is the number of eyes and $\hat{y}_{k,j}$ and $y_{k,j}$ are the predicted and measured TDVs of eye k at test point j. The RMSE and MAE of the LR, LSTM, and Bi-GRU models were calculated using these formulas. Repeated-measures one-way analysis of variance was performed to compare accuracy metrics among the LR, LSTM, and Bi-GRU models. P < 0.05 (single comparison) and p < 0.017 (multiple comparisons) were considered indicative of statistical significance. Nonparametric and parametric tests (Spearman's correlation and simple LR analyses, respectively) were performed to investigate prediction error trends according to various factors, including false-positive rate, false-negative rate, fixation loss percentage, and visual field mean deviation (MD).
Table 2 shows the demographic characteristics of the test dataset. The most common diagnosis was primary open-angle glaucoma (47.68%). The mean prediction time (time interval between prediction and final visual field examination) was 1.00 ± 0.84 years (Table 1). The mean RMSE and pointwise mean absolute error (PMAE) are shown in Table 3. Figure 3 presents representative examples of the PMAE in the visual field test.
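The two accuracy metrics can be computed as below. This is an illustrative plain-Python sketch with hypothetical function names and toy data (three test points per eye instead of 52); the repeated-measures ANOVA is not reproduced. The multiple-comparison threshold of 0.017 is numerically 0.05/3, consistent with a Bonferroni correction over three pairwise comparisons, although the text does not name the correction.

```python
import math

def rmse_per_eye(pred, actual):
    """pred, actual: lists of eyes, each a list of TDVs (52 per eye in the study).
    Returns one RMSE (dB) per eye."""
    return [math.sqrt(sum((p - a) ** 2 for p, a in zip(pe, ae)) / len(ae))
            for pe, ae in zip(pred, actual)]

def mae_per_point(pred, actual):
    """Returns one MAE (dB) per test point, averaged over all eyes."""
    n_eyes, n_points = len(actual), len(actual[0])
    return [sum(abs(pred[k][j] - actual[k][j]) for k in range(n_eyes)) / n_eyes
            for j in range(n_points)]

# Toy example with three test points per eye instead of 52.
pred = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
actual = [[1.0, 2.0, 5.0], [0.0, 0.0, 3.0]]
print(rmse_per_eye(pred, actual))   # approximately [1.155, 1.732], i.e. sqrt(4/3) and sqrt(3)
print(mae_per_point(pred, actual))  # [0.0, 0.0, 2.5]
```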

Results
Bi-GRU exhibited better prediction performance, compared with LR and LSTM. The RMSEs of Bi-GRU, LR, and LSTM were 3.71 ± 2.42, 4.81 ± 3.89, and 4.06 ± 2.61 dB, respectively. There were statistically significant differences in prediction errors among the three models (F = 42.94, p < 0.001). The RMSE was significantly lower for Bi-GRU than for the other two models (both p < 0.001).
The number of eyes binned according to RMSE prediction error is shown in Fig. 4. More than 50% of eyes had Bi-GRU prediction errors of ≤ 3 dB: ≤ 2 dB in 530 eyes (41.67%) and 2-3 dB in 175 eyes (13.76%). The corresponding LR prediction errors were ≤ 2 dB in 329 eyes (25.86%) and 2-3 dB in 254 eyes (19.97%), and the corresponding LSTM prediction errors were ≤ 2 dB in 505 eyes (39.70%) and 2-3 dB in 165 eyes (12.97%). Figure 5 shows the PMAE in the visual field. With respect to the 52 TDV points, Bi-GRU exhibited the lowest prediction error among the three models. Bi-GRU showed significantly better performance than LR at 29 points (red dots) and than LSTM at 49 points (blue dots).
Table 4 shows the mean prediction error (RMSE) according to sectors of the visual field examination (Fig. 6). The 24-2 visual field was divided into the six sectors proposed by Garway-Heath et al. 22 based on optic nerve head anatomy (superotemporal, superonasal, temporal, nasal, inferotemporal, and inferonasal) [Fig. 6b] and into two sectors (central and peripheral) [Fig. 6c]. The prediction errors of Bi-GRU were significantly lower than those of LR and LSTM for all sectors (p ≤ 0.001).
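As an arithmetic check of the binned counts, the sketch below reproduces the reported Bi-GRU percentages for the 1272 test eyes. The RMSE values are placeholders chosen only to land in each bin, bins are assumed to be right-closed (≤ 2 dB, 2-3 dB, > 3 dB), and everything above 3 dB is lumped into a single bin, unlike Fig. 4. The two lowest bins together contain 55.43% of eyes, which is the basis for the "more than 50%" statement.

```python
from bisect import bisect_left

def bin_eyes(rmses, edges=(2.0, 3.0)):
    """Count eyes per right-closed RMSE bin: <= 2 dB, (2, 3] dB, > 3 dB.
    Returns (count, percentage rounded to 2 decimals) for each bin."""
    counts = [0] * (len(edges) + 1)
    for r in rmses:
        counts[bisect_left(edges, r)] += 1  # values equal to an edge go to the lower bin
    n = len(rmses)
    return [(c, round(100 * c / n, 2)) for c in counts]

# Placeholder RMSE values that reproduce the reported Bi-GRU counts (1272 eyes).
rmses = [1.0] * 530 + [2.5] * 175 + [5.0] * 567
print(bin_eyes(rmses))  # [(530, 41.67), (175, 13.76), (567, 44.58)]
```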
The mean RMSE values binned according to various factors are listed in Table 5 and Fig. 7. Across bins of false-positive rate, false-negative rate, and fixation loss percentage, the prediction error was significantly lower for Bi-GRU than for the other two models (p ≤ 0.025). As the visual field MD increased, the RMSE prediction errors of all three models decreased. The correlation coefficients and LR analyses between the prediction error and various factors are presented in Table 6 and Fig. 8. For all models, RMSE was positively correlated with the false-negative rate and fixation loss percentage and negatively correlated with visual field MD (all p ≤ 0.029) (Fig. 8).
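The correlation analyses can be illustrated with a dependency-free sketch of Spearman's rank correlation (Pearson correlation of average ranks) and simple least-squares linear regression. It omits p-values, and the MD and RMSE values in the example are hypothetical, for illustration only.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def simple_linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical data: RMSE (dB) against visual field MD (dB) for five eyes.
md = [-15.0, -8.0, -4.0, -2.0, -1.0]
rmse = [7.0, 5.0, 3.5, 2.5, 2.0]
print(spearman(md, rmse))  # -1.0 (perfectly monotone decreasing)
```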

Discussion
To the best of our knowledge, this study is the first to utilize the Bi-GRU architecture for predicting visual field loss. We compared the prediction of visual field loss using the Bi-GRU, LR, and LSTM models. The Bi-GRU model demonstrated the highest predictive accuracy among the three models. The overall prediction errors (RMSEs) of the LR, LSTM, and Bi-GRU models were 4.81 ± 3.89, 4.06 ± 2.61, and 3.71 ± 2.42 dB, respectively. The RMSE significantly differed between Bi-GRU and the other models (p < 0.001).
In the six sectors of the visual fields according to optic nerve head anatomy, as well as the central and peripheral visual field areas, Bi-GRU exhibited superior performance compared with the other two models (all p < 0.001). The predictive performance was negatively correlated with the false-negative rate and fixation loss percentage in all three models; however, Bi-GRU was least affected by reliability indices. A decrease in MD was associated with lower prediction performance in all three models. The RMSE was lowest for Bi-GRU among the three models; Bi-GRU performed better even in patients with advanced glaucoma.
Several studies have used artificial intelligence to detect glaucoma and its progression. Asaoka et al. 23 built a deep feed-forward neural network to detect preperimetric glaucoma. The area under the receiver operating characteristic curve (AUROC) of the model was 92.6%, indicating better performance than other machine learning methods (e.g., random forest, gradient boosting, support vector machine, and neural network). Although that study was the first to use deep learning for the evaluation of preperimetric glaucoma, only a small quantity of data from preperimetric visual fields of patients with glaucoma (53 eyes) was analyzed. Elze et al. 24 classified visual fields into 16 archetypes and found that the archetypes were closely correlated with the clinical features of glaucoma 25. However, these studies classified visual fields rather than predicting visual field changes. Yousefi et al. 26 compared various machine learning algorithms for detecting glaucoma progression, using the retinal nerve fiber layer on optical coherence tomography and the MD and pattern standard deviation on visual field examination as input. The random forest classifier showed the best performance, with an AUROC of 0.88. Wang et al. 4 assessed the ability of archetypes to predict visual field changes; the mean hit and correct rejection rates were both 0.77, suggesting that the archetype approach had higher predictive ability than other methods, such as MD slope, advanced glaucoma intervention study scoring, collaborative initial glaucoma treatment study scoring, and the permutation of pointwise linear regression. However, unlike our study, previous studies did not predict visual field changes.
Dixit et al. 14 found that a deep learning algorithm based on LSTM architecture could predict the progression of visual field changes with an accuracy of 91-93%. The AUROC was 0.89-0.93 when multiple visual field examinations and baseline clinical data were used as input. Additionally, supplementing the visual field data with clinical data improved model performance. Murata et al. 5 found that variational Bayes linear regression predicted the progression of visual field changes in patients with glaucoma more accurately than conventional least-squares LR. Wen et al. 6 used Cascade-Net, a type of convolutional neural network architecture, to predict future Humphrey visual field findings from a single visual field input. The models showed excellent predictive abilities; the overall PMAE and RMSE were 2.47 and 3.47 dB, respectively. The PMAE and RMSE of the Bi-GRU model were slightly higher than those of the Cascade-Net model. However, the Cascade-Net model may not reflect true progression because the authors used a single visual field examination as input. Berchuck et al. 7 used a generalized variational autoencoder algorithm to estimate progression rates and predict future visual fields. The overall MAE was 1.89-2.33 dB, comparable with the MAE of our model. Park et al. 13 used an RNN to predict the sixth visual field examination; they found that the RMSE was 4.31 ± 2.4 dB, indicating that the RNN predicted future visual fields better than LR.
In a previous study, we used the LSTM model to analyze time-sequential input consisting of visual field examinations 13. In the present study, we built a deep learning architecture based on a Bi-GRU network. Both GRU and LSTM are variants of RNN, a state-of-the-art deep learning architecture that processes sequential data for sequence recognition and prediction 27. Cho et al. 16 presented a GRU architecture that allowed each recurrent unit to adaptively capture dependencies of different time scales. Both GRU and LSTM have recurrent units in sequence modeling. However, GRU has gating units that modulate the flow of information inside the unit without … 20 proposed a bidirectional RNN that considers both past and future input sequences to estimate the output vector. Several studies have shown that Bi-GRU outperforms LSTM 15,17,18. Bi-GRU achieved the highest classification accuracy among deep neural network-based models for human identification based on electrocardiogram biometrics 15.
Table 6. Correlation coefficients and linear regression analyses between prediction error and reliability, and between prediction error and visual field mean deviation. LR linear regression, LSTM long short-term memory, Bi-GRU bidirectional gated recurrent unit.
In the present study, Bi-GRU exhibited better predictive performance than LR and LSTM for the entire visual field, as well as for the central area; this area is important because the preservation of central visual function has a strong effect on quality of life in patients with glaucoma 28,29. Bi-GRU was least affected by reliability indices. The false-negative rate and fixation loss affected visual field prediction in all models. However, the correlation between fixation loss and visual field prediction was weak, indicating that fixation loss had only a small effect. Previous studies showed that false-negative rates, but not fixation loss, were associated with visual field assessment 13,30,31. Moreover, previous studies revealed that false-negative rates were the most common cause of unreliable visual field classification 32,33.
Our study had several limitations. First, the study results cannot be fully generalized to patients with different degrees of glaucoma severity. The study included a greater number of patients with early glaucoma (MD > − 6 dB) in the training and test datasets, compared with patients who had advanced glaucoma. Although this difference may have affected the performance of Bi-GRU model learning, it reflects the distribution of glaucoma severity observed in clinical practice.
Second, we did not include clinical data for training, in contrast to the work by Dixit et al. 14 Future studies should improve deep learning architecture by adding clinical characteristics to the input data.
Third, we trained and tested the model using five consecutive visual field data elements as input. Glaucoma specialists recommend that at least five serial visual field examinations be used to detect glaucoma progression. The Glaucoma Progression Analysis included in the Humphrey Visual Field Analyzer requires at least five reliable visual field examinations and a follow-up period of 2 years 34. Previous studies also used five visual field data elements as input to predict visual field progression in glaucoma 35,36. Additionally, sequential pointwise LR was performed with at least four visual field examinations because regression analysis is unlikely to detect a trend when fewer data are available 37. We predicted the sixth visual field examination from the previous five examinations to compare the predictive performances of the Bi-GRU and LR models. Glaucoma requires lifelong periodic visual field examinations 38,39. Thus, five consecutive visual field examinations over 3 years is not an excessive number, and predicting subsequent examinations from the initial five examinations may enhance patient convenience.
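For comparison, the conventional pointwise LR baseline can be sketched as an ordinary least-squares fit of TDV against time at each test point, extrapolated to the target date. The exact LR configuration used in this study is not described in this section, so this is an assumed, minimal version; the function name and toy data are hypothetical, and `min_exams=4` mirrors the at-least-four-examinations rule cited above.

```python
def pointwise_lr_predict(times, fields, t_target, min_exams=4):
    """Pointwise linear regression (assumed form): fit TDV = a + b * time by
    ordinary least squares separately at each test point, then extrapolate.

    times:  exam times (e.g., years from baseline), one per exam
    fields: list of exams, each a list of TDVs (52 in the study)
    """
    n = len(times)
    if n < min_exams:
        raise ValueError(f"need at least {min_exams} exams, got {n}")
    mt = sum(times) / n
    stt = sum((t - mt) ** 2 for t in times)
    predictions = []
    for j in range(len(fields[0])):
        ys = [exam[j] for exam in fields]
        my = sum(ys) / n
        slope = sum((t - mt) * (y - my) for t, y in zip(times, ys)) / stt
        predictions.append(my + slope * (t_target - mt))
    return predictions

# Two hypothetical test points: one losing 1 dB/year, one stable at -3 dB.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
fields = [[-t, -3.0] for t in times]
print(pointwise_lr_predict(times, fields, t_target=5.0))  # [-5.0, -3.0]
```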
On further analysis, we predicted future visual fields from four consecutive visual field data elements using the Bi-GRU model. The mean prediction errors were 3.84 ± 2.48 and 2.91 ± 1.96 dB for RMSE and PMAE, respectively. Although the prediction errors of the models using five and four visual field data elements differed significantly (both p < 0.001), the difference was not clinically significant.
Fourth, the model could only predict the sixth visual field examination. Future studies should collect additional patient data with a greater number of visual field examinations and evaluate the performance of our model in predicting the seventh through tenth visual field examinations, using the first five visual field examinations as input. However, our model can forecast visual fields at user-specified future time points. For example, the model can predict the visual fields at 4, 8, and 12 months after the fifth visual field examination.
In summary, a deep learning architecture using the Bi-GRU model, a variant of RNN, predicts future visual field examinations significantly better than the pointwise LR and LSTM models. The Bi-GRU model is less affected by the reliability indices of visual field input data. This model may facilitate decision-making by accurately predicting future visual field examinations in clinical practice, particularly for patients who experience difficulty with repeated examinations.

Data availability
The data generated or analyzed during this study are available from the corresponding author (J.R.P.) upon reasonable request.